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1.  INTRODUCTION 

Daring  the  period  of  this  grant  we  used  the  equipment  bought  under  this  proposal  to 
support  research  on  extendible  type  systems  for  data  base  use,  main  memory  data  bases 
and  a  new  methodology  for  connecting  application  programs  to  data  base  systems.  We 
will  report  on  the  research  in  these  areas  in  the  three  sections  which  follow. 

2.  EXTENDIBLE  TYPE  SYSTEMS 

Data  bases  must  be  designed  to  support  the  diverse  data  types  found  in  non-business 
data  processing  applications.  Such  data  elements  include  test,  bit  maps,  polygons,  spatial 
objects  and  time.  As  noted  in  [STON83]  such  objects  are  very  inefficient  to  support  on 
top  of  the  built-in  data  types  supplied  by  existing  data  base  systems.  What  is  needed  is  a 
mechanism  to  add  user  defined  data  types  to  a  data  base  system.  What  follows  is  a  brief 
sketch  of  the  solution  we  have  explored.  A  full  treatment  of  this  topic  appears  in 
(STON80). 

A  user  can  create  a  new  data  type  by  indicating  to  the  DBMS: 

The  name  of  the  type 

A  procedure  to  convert  an  ASCII  dots  element  to  one  of  the  indicated  type 
A  procedure  to  convert  an  element  from  the  indicated  type  to  ASCII 
The  storage  length  of  the  type 

With  this  information,  data  elements  of  the  new  type  can  be  stored  in  the  data  base  and 
retrieved.  To  achieve  real  leverage,  operators  and  functions  should  also  be  definable  on 
the  new  type.  We  will  discuss  operators  first.  Query  languages  support  unary  and  binary 
operators,  and  current  query  optimiters  can  generate  optimised  plans  for  query  language 
expressions  containing  such  operators  The  main  problem  to  be  solved  is  that  of  con¬ 
structing  a  query  optimiscr  that  can  successfully  process  query  language  expressions  with 
user  defined  operators. 

The  solution  is  to  convert  a  query  optimilrr  to  be  table  driven  and  to  require  each 
operation  that  is  defined  to  specify  the  required  information  We  will  indicate  the 
approach  with  a  brief  example  Most  data  base  systems  include  merge-sort  as  an 


optimization  tactic  for  queries  that  include  joins.  Using  merge-sort,  both  relations  are 
sorted  on  the  field  used  in  the  join  expression  and  then  the  sorted  relations  arc  merged  to 
produce  the  answer.  A  current  query  optimizer  has  built  into  it  the  knowledge  that  ”«=" 
is  a  join  operator  for  which  merge-sort  is  usable  and  that  both  relations  must  be  sorted 
using  the  "  <"  operator.  An  optimizer  that  processes  extendible  operators  must  be  able  to 
know  for  each  operator  whether  merge-sort  is  a  usable  tactic  and  what  sort  operator  to 
use  in  the  algorithm.  This  information  can  be  specified  when  each  operator  is  defined. 
Then,  the  run-time  system  must  be  able  to  process  merge-sort  by  calling  a  sort  package 
which  can  sort  using  any  operator.  The  other  changes  that  a  DBMS  optimizer  must 
undergo  are  specified  in  (STON86],  and  an  implementation  of  these  ideas  is  underway 
(STON80A). 

The  second  thing  which  an  extendible  type  system  requires  is  the  possibility  of  user 
defined  indexes.  Current  DBMSs  give  users  B-trees  and  sometimes  hashing  as  access 
methods.  Such  methods  are  ideal  for  business  data  processing  applications,  but  fail  com¬ 
pletely  when  presented  with  spatial  data.  A  DBMS  must  thereby  be  capable  of  adding 
user  definable  access  methods.  Candidates  for  spatial  data  include  R-trees  [GUTM84]  and 
K-D-B-trees  [ROB181], 

The  main  problem  in  supporting  new  access  methods  is  instructing  the  query  optim¬ 
izer  on  the  performance  of  such  new  paths  to  data.  One  must  be  able  to  tell  the  optimizer 
what  form  of  expressions  the  access  methods  can  process  and  what  the  expected  cost  of 
any  such  expression  is.  The  solution  is  to  include  the  expressions  in  a  compact  notation 
when  any  given  access  path  is  created.  The  expected  cost  of  any  expression  is  indicated 
by  entries  in  appropriate  system  catalogs.  The  query  optimizer  can  then  look  in  the  cata¬ 
logs  for  needed  information.  Complete  details  of  this  technique  are  given  in  [STON86], 

3.  MAIN  MEMORY  DATA  BASES 

In  engineering  applications  one  frequently  browses  a  large  amount  of  related  data. 
Hence,  the  plummeting  cost  of  semiconductor  memory  makes  it  increasingly  possible  to 
put  a  data  base  entirely  in  primary  memory.  As  part  of  this  research  project  we  investi¬ 
gated  techniques  to  apply  main  memory  data. 

First,  we  investigated  how  well  B-trees  performed  when  compared  with  structures 
exclusively  designed  for  main  memory  use  such  as  balanced  binary  trees.  When  data  fits 
entirely  in  main  storage,  binary  trees  were  shown  to  be  superior;  however,  their  perfor¬ 
mance  degraded  rapidly  when  insufficient  main  storage  was  available  for  the  size  of  the 
data  set.  We  concluded  that  main  memory  oriented  structures  were  only  suitable  if  virtu¬ 
ally  the  entire  data  set  fits  in  main  memory.  The  details  of  this  analysis  were  presented  in 
(DEWI84). 

The  second  aspect  of  main  memory  which  we  investigated  was  join  tactics  that 
could  be  used  with  large  amounts  of  main  memory.  We  analyzed  several  algorithms, 
including  conventional  range  sort,  a  basic  join  algorithm  based  on  hashing,  and  two  algo¬ 
rithms  based  on  partitioned  hashing.  Our  analysis  showed  that  when  a  small  amount  of 
main  memory  was  available,  the  merge-sort  was  the  clear  winner.  When  storage  was 
available  for  the  smaller  of  the  two  relations,  then  any  of  the  hash  join  algorithms  outper¬ 
formed  merge  join  by  a  substantial  margin.  Between  those  two  amounts  of  memory,  par¬ 
titioned  hashing  was  shown  to  be  advantageous.  Again  the  details  of  the  analysis  appear 
in  (DEW184). 
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The  final  aspect  of  a  main  memory  data  base  system  which  we  studied  was  the  pro¬ 
cessing  of  data  base  logs.  In  a  main  memory  environment,  the  I/Os  associated  with  the 
log  may  be  one  of  the  most  expensive  aspects  of  transaction  processing.  First,  we  investi¬ 
gated  the  possibility  of  processing  a  whole  collection  of  transaction  commit  operations  as  a 
group.  This  "group  commit”  concept  was  shown  to  dramatically  enhance  performance 
and  was  independently  investigated  by  the  designers  of  the  IMS/Fast  Path  System.  We 
also  studied  the  speedups  available  by  utilising  persistent  main  memory,  i.e.,  memory  that 
does  not  fail  when  the  power  is  interrupted  or  a  crash  occurs.  It  was  shown  that  moving 
the  log  to  persistent  main  memory  was  an  advantageous  tactic. 

4.  INTERFACE  TO  AN  APPLICATION  PROGRAM 

Conventional  data  base  systems  use  the  notion  of  a  "cursor”  to  interface  between  a 
data  manager  and  an  application  program.  An  application  can  define  a  query  and  then 
iterate  over  the  records  that  it  produces  using  this  construct.  However,  especially  in 
browsing  applications  the  cursor  concept  has  at  least  three  disadvantages.  First,  cursors 
only  allow  an  application  to  retrieve  records  in  a  forward  direction.  After  a  record  has 
been  fetched,  there  is  no  easy  way  to  move  backwards  and  fetch  it  again  as  might  be  done 
by  a  typical  browsing  application.  The  second  problem  is  that  only  one  record  can  be 
fetched  at  a  time.  Here  an  application  that  fills  up  a  screen  with  20  or  30  records  must 
fetch  them  one  by  one.  The  last  disadvantage  is  that  once  a  query  is  specified,  there  is  no 
way  to  dynamically  restrict  its  scope.  For  example,  while  browsing  the  employees  whose 
age  is  less  than  40,  one  cannot  search  forward  to  an  employee  named  Smith  except  by 
scanning  all  records  one  by  one  and  searching  for  the  qualifying  name.  This  makes 
browsing-oriented  programs  difficult  to  write  and  subject  to  bad  performance. 

We  have  designed  a  new  interface  based  on  the  concept  of  a  portal.  A  portal  func¬ 
tions  much  like  a  "tall  cursor,”  i.e.,  one  for  a  which  a  variable  number  of  records  are 
returned  on  each  call.  Portals  allow  forward  and  backward  scans  as  well  as  forward  and 
backward  search  operations.  As  such  they  are  a  much  better  base  on  which  to  build 
browsing  oriented  application  programs.  A  completed  proposal  of  the  portal  concept 
appears  in  (STON84)  and  we  are  implementing  a  variant  of  this  proposal  in  [STON86A], 
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